Mining Protein Interaction Groups

Author: Lusheng Wang
Publication venue: IntechOpen
Publication date: 30/03/2012

IntechOpen

Computing the protein binding sites

Authors: Guo Fei, Wang Lusheng
Publication venue: BioMed Central
Publication date: 01/01/2012

Springer - Publisher Connector

PubMed Central

Linked region detection using high-density SNP genotype data via the minimum recombinant model of pedigree haplotype inference

Authors: Wang Lusheng, Wang Zhanyong, Yang Wanling
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: With the rapid development of high-throughput genotyping technologies, efficient methods for identifying linked regions using high-density SNP genotype data have become more and more important. Recently, a deterministic method that works very well on SNP genotyping data has been developed (Lin et al. Bioinformatics 2008, 24(1): 86–93). However, that program can only work on a limited number of family structures. In particular, the results (if any) will be poor when the genotype data for the whole chromosome of one of the parents in a nuclear family is missing. Results: We have developed a software package (LIden) for identifying linked regions using high-density SNP genotype data. We focus on handling the case where the genotype data for the whole chromosome of one of the parents in a nuclear family is missing. We use the minimum recombinant model for haplotype inference in pedigrees. Several local optimization algorithms are used to infer the haplotype of each individual and determine the linked regions based on the inferred haplotype data. We have developed a more flexible method to combine nuclear families to further refine (reduce the length of) the linked regions. Conclusion: Our new package (LIden) is efficient software for linked region detection using high-density SNP genotype data. LIden can handle some important cases where the existing programs do not work well. In particular, the new package can handle many cases where the genotype data of one of the two parents is missing for the entire chromosome. The running time of the program is O(mn), where m is the number of members in the family and n is the number of SNP sites in the chromosome. LIden is specifically suitable for handling big sized families. This research also demonstrates another practical use of the minimum recombinant model for haplotype inference in pedigrees. The software package can be downloaded at http://www.cs.cityu.edu.hk/~lwang/software/Link.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub

An efficient algorithm for the blocked pattern matching problem

Authors: Deng Fei, Liu Xiaowen, Wang Lusheng
Publication venue: Oxford University Press (OUP)
Publication date: 01/10/2015

Motivation: Tandem mass spectrometry (MS) has become the method of choice for protein identification and quantification. In the era of big data biology, tandem mass spectra are often searched against huge protein databases generated from genomes or RNA-Seq data for peptide identification. However, most existing tools for MS-based peptide identification compare a tandem mass spectrum against all peptides in a database whose molecular masses are similar to the precursor mass of the spectrum, making mass spectral data analysis slow for huge databases. Tag-based methods extract peptide sequence tags from a tandem mass spectrum and use them as a filter to reduce the number of candidate peptides, thus speeding up the database search. Recently, gapped tags have been introduced into mass spectral data analysis because they improve the sensitivity of peptide identification compared with sequence tags. However, the blocked pattern matching (BPM) problem, which is an essential step in gapped tag-based peptide identification, has not been fully solved. Results: In this article, we propose a fast and memory-efficient algorithm for the BPM problem. Experiments on both simulated and real datasets showed that the proposed algorithm achieved high speed and high sensitivity for peptide filtration in peptide identification by database search.
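As a rough illustration of what matching a gapped tag against a protein sequence can look like, the sketch below implements a naive baseline, not the fast algorithm proposed in the article. It assumes a simplified reading of the BPM problem: the pattern is a list of integer block masses, and a match is a position in the text from which consecutive substrings have residue masses summing exactly to each block in turn. The function name `match_blocks` and the small table of nominal (integer) residue masses are illustrative choices, not taken from the article.

```python
# Nominal (integer) residue masses for a few amino acids; an illustrative subset.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "L": 113, "N": 114, "D": 115, "K": 128, "E": 129,
}


def match_blocks(text, blocks):
    """Return every start index where `text` can be cut into consecutive
    substrings whose residue masses sum to the values in `blocks`, in order.

    Naive baseline: one left-to-right scan per start position.
    """
    hits = []
    for start in range(len(text)):
        pos = start
        matched = True
        for block in blocks:
            total = 0
            # Masses are positive, so extending greedily until the running
            # total reaches the block mass is the only possible split.
            while pos < len(text) and total < block:
                total += RESIDUE_MASS[text[pos]]
                pos += 1
            if total != block:
                matched = False
                break
        if matched:
            hits.append(start)
    return hits


# A block of mass 128 matches both the single residue K and the pair G+A.
print(match_blocks("GAK", [128]))
print(match_blocks("GASPVG", [128, 87]))
```

This baseline takes time proportional to the text length times the matched span for each pattern, which is the kind of cost a dedicated BPM algorithm aims to avoid on huge databases.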

IUPUIScholarWorks

Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences

Authors: Fu Bin, Kao Ming-Yang, Wang Lusheng
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/01/2009

We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet Σ. A motif G = g_1g_2···g_m is a string of m characters. A probabilistically generated approximate copy of G is implanted into each background sequence. For an approximate copy b_1b_2···b_m of G, every character b_i is probabilistically generated such that the probability that b_i ≠ g_i is at most α. In this paper, we give the first analytical proof that multiple background sequences do help with finding subtle and faint motifs. This work is a theoretical approach with a rigorous probabilistic analysis. We develop an algorithm that under the probabilistic model can find the implanted motif with high probability when the number of background sequences is reasonably large. Specifically, we prove that for α < 0.1771 and any constant x ≥ 8, there exist constants t_0, δ_0, δ_1 > 0 such that if the length of the motif is at least δ_0 log n, the alphabet has at least t_0 characters, and there are at least δ_1 log n_0 input sequences, then in O(n^3) time our algorithm finds the motif with probability at least 1 − 1/2^x, where n is the longest length of any input sequence and n_0 ≤ n is an upper bound for the length of the motif.
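The probabilistic model in this abstract can be made concrete with a small generator of test instances. The following is a minimal sketch under stated assumptions, not the authors' algorithm or benchmark: background characters are drawn uniformly from the alphabet, each motif character is replaced by a different character with probability exactly alpha (the abstract only bounds this probability from above), and the noisy copy overwrites a uniformly chosen window of the background. The function name `implant_motif` and its parameters are hypothetical.

```python
import random


def implant_motif(motif, k, n, alphabet, alpha, rng):
    """Generate k sequences of length n, each containing a noisy copy of `motif`.

    Returns (sequences, positions), where positions[i] is the start index of
    the implanted copy in sequences[i].
    """
    sequences, positions = [], []
    for _ in range(k):
        # Background: independent uniform characters (an assumption).
        background = [rng.choice(alphabet) for _ in range(n)]
        # Approximate copy: each character is changed with probability alpha.
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        # Overwrite a random window of the background with the copy.
        start = rng.randrange(n - len(motif) + 1)
        background[start:start + len(motif)] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions


rng = random.Random(0)
seqs, pos = implant_motif("ACGTACGT", k=5, n=40, alphabet="ACGT", alpha=0.1, rng=rng)
for s, p in zip(seqs, pos):
    print(p, s)
```

Instances produced this way are what a motif discovery program would be run on; the planted positions serve as ground truth for checking its output.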

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Better Practical Algorithms for rSPR Distance and Hybridization Number

Authors: Chen Zhi-Zhong, Wang Lusheng, Yamada Kohei
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

The problem of computing the rSPR distance of two phylogenetic trees (denoted by RDC) is NP-hard and so is the problem of computing the hybridization number of two phylogenetic trees (denoted by HNC). Since they are important problems in phylogenetics, they have been studied extensively in the literature. Indeed, quite a number of exact or approximation algorithms have been designed and implemented for them. In this paper, we design and implement one exact algorithm for HNC and several approximation algorithms for RDC and HNC. Our experimental results show that the resulting exact program is much faster (namely, more than 80 times faster for the easiest dataset used in the experiments) than the previous best and its superiority in speed becomes even more significant for more difficult instances. Moreover, the resulting approximation programs output much better results than the previous bests; indeed, the outputs are always nearly optimal and often optimal. Of particular interest is the usage of the Monte Carlo tree search (MCTS) method in the design of our approximation algorithms. Our experimental results show that with MCTS, we can often solve HNC exactly within a short time.

Dagstuhl Research Online Publication Server